Abstract
Introduction T-cell redirecting bispecific antibodies (BsAbs) have significantly advanced treatment of relapsed/refractory multiple myeloma (RRMM), offering unprecedented response rates in heavily pretreated patients. However, cytokine release syndrome (CRS) remains a potentially life-threatening toxicity that complicates patient management and may limit treatment accessibility. Severe cases (grade ≥2) require intensive monitoring, hospitalization, and immunosuppressive interventions. Current clinical practice relies on reactive CRS management after onset, lacking validated tools for a priori risk assessment that could guide prophylactic strategies. The ability to accurately stratify patients based on severe CRS risk before treatment initiation would enable personalized monitoring protocols, optimize resource allocation, and potentially expand treatment to outpatient settings for low-risk patients. This study assesses the effectiveness of supervised machine learning approaches for predicting severe CRS in RRMM patients receiving BsAb therapy, with the goal of developing a clinically applicable risk stratification tool.
Methods Patients with RRMM treated with BsAbs were included. Baseline demographic, clinical, laboratory, and cytogenetic data were retrospectively collected for all patients, yielding a database of more than 100 variables. Missing data were handled by median and mode imputation. Feature selection used a multi-step approach combining Random Forest (RF) importance, principal component analysis (PCA), and Lasso regression within a 5-fold cross-validation framework. Variables selected in ≥4 folds were retained and supplemented with expert hematologist knowledge to create a final set of 25 variables. An RF classifier was trained to predict the occurrence of severe CRS. Model performance was evaluated using stratified 5-fold cross-validation, measuring area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). To assess clinical utility, we derived a post hoc risk stratification system from pooled out-of-sample predictions. The low-risk threshold was defined as the highest cutoff maintaining NPV ≥95%, to reliably identify patients unlikely to develop severe CRS. The high-risk threshold was chosen to maximize PPV, to identify high-risk cases as precisely as possible. Model interpretability was assessed using SHapley Additive exPlanations (SHAP) values to identify key predictive features and their relative contributions.
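The two-threshold rule described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the tie-breaking rule (lowest cutoff among those with equal PPV), the convention that "low risk" means a predicted probability strictly below the cutoff, and the ten toy predictions are all assumptions made for the example.

```python
def npv_below(probs, labels, cutoff):
    """NPV among patients whose predicted probability is below the cutoff."""
    below = [y for p, y in zip(probs, labels) if p < cutoff]
    if not below:
        return None  # nobody classified as low risk at this cutoff
    return below.count(0) / len(below)


def ppv_at_or_above(probs, labels, cutoff):
    """PPV among patients whose predicted probability is at or above the cutoff."""
    above = [y for p, y in zip(probs, labels) if p >= cutoff]
    if not above:
        return None
    return sum(above) / len(above)


def derive_thresholds(probs, labels, min_npv=0.95):
    """Return (low_cutoff, high_cutoff) from pooled out-of-sample predictions."""
    candidates = sorted(set(probs))

    # Low-risk cutoff: highest candidate that still keeps NPV >= min_npv.
    # NPV is not monotonic in the cutoff, so every candidate is checked.
    low = None
    for c in candidates:
        npv = npv_below(probs, labels, c)
        if npv is not None and npv >= min_npv:
            low = c

    # High-risk cutoff: candidate above the low cutoff with the best PPV.
    # Assumption: ties go to the lowest cutoff, which flags more patients.
    high, best_ppv = None, -1.0
    for c in candidates:
        if low is not None and c <= low:
            continue
        ppv = ppv_at_or_above(probs, labels, c)
        if ppv is not None and ppv > best_ppv:
            high, best_ppv = c, ppv
    return low, high


def stratify(prob, low, high):
    """Assign one predicted probability to a risk category."""
    if low is not None and prob < low:
        return "low"
    if high is not None and prob >= high:
        return "high"
    return "intermediate"


# Synthetic toy data, NOT from the study: predicted probabilities of
# severe CRS and observed outcomes (1 = severe CRS, 0 = no severe CRS).
toy_probs = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.55, 0.70, 0.85, 0.90]
toy_labels = [0, 0, 0, 0, 1, 0, 0, 1, 1, 1]

low_cut, high_cut = derive_thresholds(toy_probs, toy_labels)
groups = [stratify(p, low_cut, high_cut) for p in toy_probs]
print(low_cut, high_cut, groups)
```

Because both cutoffs are tuned on the same pooled predictions they are later evaluated on, the resulting NPV and PPV are likely optimistic, which is one reason external validation matters for a rule of this kind.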
Results A total of 94 RRMM patients were included, with a median age of 61 years (range 39–89), treated with either CD3/BCMA (n=75) or CD3/GPRC5D (n=19) BsAbs. Severe CRS occurred in 18 patients (19%). The optimized model achieved a cross-validated AUC of 0.81 ± 0.07, with a sensitivity of 0.72 ± 0.18, specificity of 0.91 ± 0.08, PPV of 0.70 ± 0.17, and NPV of 0.94 ± 0.04. Using the thresholding strategy, patients were stratified into three risk categories: low-risk (n = 31, 33%), intermediate-risk (n = 56, 60%), and high-risk (n = 7, 7%). The low-risk group had an NPV of 96.77%, with only 1 of 31 patients experiencing severe CRS, supporting its potential role in clinical triage. The high-risk group showed a PPV of 71%, though it represented a small subset and captured only 27% of all severe CRS cases. SHAP analysis showed that CRS risk was influenced by a combination of clinical and laboratory variables. The most impactful predictors were related to hematologic and bone marrow function, renal and electrolyte balance, coagulation status, tissue damage, and patient age. These findings indicate that routine laboratory values contribute non-linearly to severe CRS risk.
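The group-level figures above can be re-derived from the reported counts, as in the short check below. One input is an assumption rather than a reported number: the abstract gives a 71% PPV for the 7 high-risk patients but not the underlying count, so 5 severe cases (5/7 ≈ 71%) is inferred here. Under that assumption the high-risk group captures 5 of 18 severe cases (about 27.8%), in line with the roughly 27% reported.

```python
# Counts taken from the Results.
n_total = 94
n_severe = 18
n_low, n_intermediate, n_high = 31, 56, 7
severe_in_low = 1

# ASSUMPTION: not stated in the abstract, inferred from PPV = 71% of 7 patients.
severe_in_high = 5

# NPV of the low-risk group: patients without severe CRS / all low-risk patients.
npv_low = (n_low - severe_in_low) / n_low

# PPV of the high-risk group: patients with severe CRS / all high-risk patients.
ppv_high = severe_in_high / n_high

# Share of all severe CRS cases that fall in the high-risk group.
capture_high = severe_in_high / n_severe

# Overall rate of severe CRS in the cohort.
prevalence = n_severe / n_total

print(f"NPV (low-risk):        {npv_low:.2%}")
print(f"PPV (high-risk):       {ppv_high:.1%}")
print(f"Severe cases captured: {capture_high:.1%}")
print(f"Severe CRS rate:       {prevalence:.1%}")
```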
Conclusions This preliminary study supports the feasibility of using machine learning to identify RRMM patients at low risk for severe CRS prior to BsAb therapy, based on routinely acquired and widely accessible laboratory parameters. The high NPV achieved by the low-risk threshold indicates that such models may be valuable in guiding triage decisions, potentially enabling safer outpatient management for selected patients. However, given the modest sample size and limited sensitivity in the high-risk category, these findings should be interpreted with caution. Future work should focus on validating this approach in larger, prospective, and multi-center cohorts to assess its generalizability and refine its clinical applicability.